Abstract

Designing and implementing typed programming languages is 
hard. Every new type system feature requires extending the metathe- 
ory and implementation, which are often complicated and fragile. 
To ease this process, we would like to provide general mechanisms 
that subsume many different features. 

In modern type systems, parametric polymorphism is funda- 
mental, but intersection polymorphism has gained little traction in 
programming languages. Most practical intersection type systems 
have supported only refinement intersections, which increase the 
expressiveness of types (more precise properties can be checked) 
without altering the expressiveness of terms; refinement intersec- 
tions can simply be erased during compilation. In contrast, unre- 
stricted intersections increase the expressiveness of terms, and can 
be used to encode diverse language features, promising an economy 
of both theory and implementation. 

We describe a foundation for compiling unrestricted intersection
and union types: an elaboration type system that generates ordinary
λ-calculus terms. The key feature is a Forsythe-like merge
construct. With this construct, not all reductions of the source 
program preserve types; however, we prove that ordinary call-by- 
value evaluation of the elaborated program corresponds to a type- 
preserving evaluation of the source program. 

We also describe a prototype implementation and applications 
of unrestricted intersections and unions: records, operator overload- 
ing, and simulating dynamic typing. 

Categories and Subject Descriptors F.3.3 [Mathematical Logic and For- 
mal Languages] : Studies of Program Constructs — Type structure 

Keywords intersection types 

1. Introduction 

In type systems, parametric polymorphism is fundamental. It en- 
ables generic programming; it supports parametric reasoning about 
programs. Logically, it corresponds to universal quantification. 

Intersection polymorphism (the intersection type A ∧ B) is
less well appreciated. It enables ad hoc polymorphism; it supports
irregular generic programming. Logically, it roughly corresponds
to conjunction.¹ Not surprisingly, then, intersection is remarkably
versatile.



¹ In our setting, this correspondence is strong, as we will see in Section 2.



For both legitimate and historical reasons, intersection types 
have not been used as widely as parametric polymorphism. One 
of the legitimate reasons for the slow adoption of intersection 
types is that no major language has them. A restricted form of 
intersection, refinement intersection, was realized in two extensions
of SML, SML-CIDRE (Davies 2005) and Stardust (Dunfield 2007).
These type systems can express properties such as bitwise parity: 
after refining a type bits of bitstrings with subtypes even (an even 
number of ones) and odd (an odd number of ones), a bitstring 
concatenation function can be checked against the type 



    (even * even → even) ∧ (odd * odd → even)
  ∧ (even * odd → odd) ∧ (odd * even → odd)

which satisfies the refinement restriction: all the intersected types
refine a single simple type, bits * bits → bits.

But these systems were only typecheckers. To compile a pro- 
gram required an ordinary Standard ML compiler. SML-CIDRE 
was explicitly limited to checking refinements of SML types, with- 
out affecting the expressiveness of terms. In contrast, Stardust 
could typecheck some kinds of programs that used general intersec- 
tion and union types, but ineffectively: since ordinary SML com- 
pilers don't know about intersection types, such programs could 
never be run. 

Refinement intersections and unions increase the expressive- 
ness of otherwise more-or-less-conventional type systems, allow- 
ing more precise properties of programs to be verified through 
typechecking. The point is to make fewer programs pass the type- 
checker; for example, a concatenation function that didn't have the 
parity property expressed by its type would be rejected. In con- 
trast, unrestricted intersections and unions, in cooperation with a 
term-level "merge" construct, increase the expressiveness of the 
term language. For example, given primitive operations Int.+ :
int * int → int and Real.+ : real * real → real, we can easily
define an overloaded addition operation by writing a merge:

    val + = Int.+ ,, Real.+

In our type system, this function + can be checked against the type
(int * int → int) ∧ (real * real → real).

In this paper, we consider unrestricted intersection and union 
types. Central to the approach is a method for elaborating programs 
with intersection and union types: elaborate intersections into prod- 
ucts, and unions into sums. The resulting programs have no inter- 
sections and no unions, and can be compiled using conventional 
means: any SML compiler will do. The above definition of + is
elaborated to a pair (Int.+, Real.+); uses of + on ints become
first projections of +, while uses on reals become second
projections of +.
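The elaborated form of the overloaded + can be sketched as follows. This is an illustration in Python rather than SML, under the assumption that Python's built-in addition on `int` and `float` stands in for Int.+ and Real.+; the names `int_add`, `real_add` and `plus` are hypothetical and do not come from the prototype.

```python
# Stand-ins for the primitive operations Int.+ and Real.+ (assumed names).
def int_add(a: int, b: int) -> int:
    return a + b

def real_add(a: float, b: float) -> float:
    return a + b

# The merge  Int.+ ,, Real.+  elaborates to an ordinary pair.
plus = (int_add, real_add)

# A use of + at type int * int -> int becomes a first projection.
three = plus[0](1, 2)

# A use of + at type real * real -> real becomes a second projection.
total = plus[1](1.5, 2.25)
```

After elaboration nothing overloaded remains: which component is selected was decided by the typing derivation, not at run time.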

We present a three-phase design, based on this method, that 
supports one of our ultimate goals: to develop simpler compilers 
for full-featured type systems by encoding many features using 
intersections and unions. 
[Figure 1 diagram: a source-language program e : A (types built from →, ∧, ∨) elaborates to a target-language term M : T. Nondeterministic evaluation (cbv + merge) takes e to a result v : A; standard evaluation (cbv) takes M to W : T; and v elaborates to W.]

Figure 1. Elaboration and computation



1. An encoding phase that straightforwardly rewrites the program,
for example, turning a multi-field record type into an intersec- 
tion of single-field record types, and multi-field records into a 
"merge" of single-field records. 

2. An elaboration phase that transforms intersections and unions 
into products and (disjoint) sums, and intersection and union 
introductions and eliminations (implicit in the source program) 
into their appropriate operations: tupling, projection, injection, 
and case analysis. 

3. A compilation phase: a conventional compiler with no support 
for intersections, unions, or the features encoded by phase 1. 

Contributions: Phase 2 is the main contribution of this paper. 
Specifically, we will: 

• develop elaboration typing rules which, given a source expression
e with unrestricted intersections and unions, and a "merging"
construct e₁ ,, e₂, typecheck and transform the program
into an ordinary λ-calculus term M (with sums and products);

• give a nondeterministic operational semantics (⇝*) for source
programs containing merges, in which not all reductions preserve
types;

• prove a consistency (simulation) result: ordinary call-by-value
evaluation (↦*) of the elaborated program produces a value
corresponding to a value resulting from (type-preserving)
reductions of the source program; that is, the diagram in Figure 1
commutes;

• describe an elaborating typechecker that, by implementing the 
elaboration typing rules, takes programs written in an ML-like 
language, with unrestricted intersection and union types, and 
generates Standard ML programs that can be compiled with any 
SML compiler. 

All proofs were checked using the Twelf proof assistant (Pfenning
and Schürmann 1999; Twelf 2012) (with the termination
checker silenced for a few inductive cases, where the induction
measure was nontrivial) and are available on the web (Dunfield
2012). For convenience, the names of Twelf source files (.elf)
are hyperlinks.

While the idea of compiling intersections to products is not new, 
this paper is its first full development and practical expression. 
An essential twist is the source-level merging construct e₁ ,, e₂,
which embodies several computationally distinct terms, which can
be checked against various parts of an intersection type, reminiscent
of Forsythe (Reynolds 1996) and (more distantly) the λ&-calculus
(Castagna et al. 1995). Intersections can still be introduced
without this construct; it is required only when no single term
can describe the multiple behaviours expressed by the intersection.
Remarkably, this merging construct also supports union eliminations
with two computationally distinct branches (unlike markers
for union elimination in work such as Pierce (1991a)). As usual,
we have no source-level intersection eliminations and no source-level
union introductions; elaboration puts all needed projections
and injections into the target program.

Contents: In Section 2, we give some brief background on intersection
types, discuss their introduction and elimination rules, introduce
and discuss the merge construct, and compare intersection
types to product types. Section 3 gives background on union types,
discusses their introduction and elimination rules, and shows how
the merge construct is also useful for them.

Section 4 has the details of the source language and its (unusual)
operational semantics, and describes a non-elaborating type system
including subsumption. Section 5 presents the target language and
its (entirely standard) typing and operational semantics. Section 6
gives the elaboration typing rules, and proves several key results
relating source typing, elaboration typing, the source operational
semantics, and the target operational semantics.

Section 7 discusses a major caveat: the approach, at least in
its present form, lacks the theoretically and practically important
property of coherence, because the meaning of a target program
depends on the choice of elaboration typing derivation.

Section 8 shows encodings of type system features into intersections
and unions, with examples that are successfully elaborated
by our prototype implementation (Section 9). Related work is discussed
in Section 10, and Section 11 concludes.

2. Intersection Types 

What is an intersection type? The simplistic answer is that, supposing
that types describe sets of values, A ∧ B describes the intersection
of the sets of values of A and B. That is, v : A ∧ B if v : A
and v : B.

Less simplistically, the name has been used for substantially
different type constructors, though all have a conjunctive flavour.
The intersection type in this paper is commutative (A ∧ B =
B ∧ A) and idempotent (A ∧ A = A), following several seminal
papers on intersection types (Pottinger 1980; Coppo et al. 1981)
and more recent work with refinement intersections (Freeman and
Pfenning 1991; Davies and Pfenning 2000; Dunfield and Pfenning
2003). Other lines of research have worked with nonlinear and/or
ordered intersections, e.g. Kfoury and Wells (2004), which seem
less directly applicable to practical type systems (Møller Neergaard
and Mairson 2004).

For this paper, then: What is a commutative and idempotent 
intersection type? 

One approach to this question is through the Curry-Howard 
correspondence. Naively, intersection should correspond to logical 
conjunction — but products correspond to logical conjunction, and 
intersections are not products, as is evident from comparing the 
standard introduction and elimination rules for intersection to the 
(utterly standard) rules for product. (Throughout this paper, k is
existentially quantified over {1, 2}; technically, and in the Twelf
formulation, we have two rules ∧E₁ and ∧E₂, etc.)



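\[
\frac{e : A_1 \qquad e : A_2}{e : A_1 \wedge A_2}\;\wedge\mathrm{I}
\qquad\qquad
\frac{e_1 : A_1 \qquad e_2 : A_2}{(e_1, e_2) : A_1 * A_2}\;{*}\mathrm{I}
\]
\[
\frac{e : A_1 \wedge A_2}{e : A_k}\;\wedge\mathrm{E}_k
\qquad\qquad
\frac{e : A_1 * A_2}{\mathsf{proj}_k\, e : A_k}\;{*}\mathrm{E}_k
\]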

² For impure call-by-value languages like ML, ∧I ordinarily needs to be
restricted to type a value v, for reasons analogous to the value restriction
on parametric polymorphism (Davies and Pfenning 2000). Our setting,
however, is not ordinary: the technique of elaboration makes the more
permissive rule safe, though user-unfriendly. See Section 6.5.
Here ∧I types a single term e which inhabits type A₁ and type
A₂: via Curry-Howard, this means that a single proof term serves
as witness to two propositions (the interpretations of A₁ and A₂).
On the other hand, in *I two separate terms e₁ and e₂ witness
the propositions corresponding to A₁ and A₂. This difference was
suggested by Pottinger (1980), and made concrete when Hindley
(1984) showed that intersection (of the form described by Coppo
et al. (1981) and Pottinger (1980)) cannot correspond to conjunction
because the following type, the intersection of the types of the
I and S combinators, is uninhabited:

(A → A) ∧ ((A→B→C) → (A→B) → A → C)

(writing D for the second conjunct, the type of S), yet the prospectively
corresponding proposition is provable in intuitionistic logic:

(A ⊃ A) and ((A⊃B⊃C) ⊃ (A⊃B) ⊃ A ⊃ C)    (*)

Hindley notes that every term of type A → A is β-equivalent
to e₁ = λx. x, and every term of type D is β-equivalent to
e₂ = λx. λy. λz. x z (y z), the S combinator. Any term e of type
(A → A) ∧ D must therefore have two normal forms, e₁ and e₂,
which is impossible.

But that impossibility holds for the usual λ-terms. Suppose we
add a merge construct e₁ ,, e₂ that, quite brazenly, can step to two
different things: e₁ ,, e₂ ↦ e₁ and e₁ ,, e₂ ↦ e₂. Its typing rule
chooses one subterm and ignores the other (throughout this paper,
the subscript k ranges over {1, 2}):

\[
\frac{e_k : A}{e_1 \mathbin{,,} e_2 : A}\;\mathrm{merge}_k
\]

In combination with ∧I, the mergeₖ rule allows two distinct implementations
e₁ and e₂, one for each of the components A₁ and A₂
of the intersection:

\[
\frac{\dfrac{e_1 : A_1}{e_1 \mathbin{,,} e_2 : A_1}\;\mathrm{merge}_1
      \qquad
      \dfrac{e_2 : A_2}{e_1 \mathbin{,,} e_2 : A_2}\;\mathrm{merge}_2}
     {e_1 \mathbin{,,} e_2 : A_1 \wedge A_2}\;\wedge\mathrm{I}
\]

Now (A → A) ∧ D is inhabited:

e₁ ,, e₂ : (A → A) ∧ D

With this construct, the "naive" hope that intersection corresponds
to conjunction is realized through elaboration: we can elaborate
e₁ ,, e₂ to (e₁, e₂), a term of type (A → A) * D, which does correspond
to the proposition (*). Inhabitation and provability again
correspond, because we have replaced the seemingly mysterious
intersections with simple products.
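To make the elaboration of the merge of I and S concrete, here is a small sketch in Python (untyped, so the types appear only in comments). The helper arguments `add` and `double` are hypothetical values chosen only to exercise the S combinator.

```python
# e1 = λx. x, the I combinator.
def identity(x):
    return x

# e2 = λx. λy. λz. x z (y z), the S combinator.
def s_combinator(x):
    return lambda y: lambda z: x(z)(y(z))

# Elaboration of the merge e1 ,, e2: an ordinary pair of type (A -> A) * D.
merged = (identity, s_combinator)

# A use at type A -> A elaborates to the first projection,
# and a use at type D elaborates to the second projection.
as_identity = merged[0]
as_s = merged[1]

add = lambda a: lambda b: a + b   # hypothetical curried argument
double = lambda a: a * 2          # hypothetical argument

result_i = as_identity(5)
result_s = as_s(add)(double)(3)   # add(3)(double(3)) = 3 + 6
```

No single λ-term behaves as both combinators, but the pair does contain both, which is the sense in which the product restores the correspondence with conjunction.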

For source expressions, intersection still has several properties 
that set it apart from product. Unlike product, it has no elimination 
form. It also lacks an explicit introduction form; ∧I is the only intro
rule for ∧. While the primary purpose of mergeₖ is to derive the
premises of ∧I, the mergeₖ rule makes no mention of intersection
(or any other type constructor).

Pottinger (1980) presents intersection A ∧ B as a proposition
with some evidence of A that is also evidence of B, unlike A & B,
corresponding to A * B, which has two separate pieces of evidence
for A and for B. In our system, though, e₁ ,, e₂ is a single term
that provides evidence for A and B, so it is technically consistent
with this view of intersection, but not necessarily consistent in spirit
(since e₁ and e₂ can be very different from each other).

3. Union Types 

Having discussed intersection types, we can describe union types
as intersections' dual: if v : A₁ ∨ A₂ then either v : A₁ or v : A₂
(perhaps both). This duality shows itself in several ways.

For union ∨, introduction is straightforward, as elimination was
straightforward for ∧ (again, k is either 1 or 2):



\[
\frac{\Gamma \vdash e : A_k}{\Gamma \vdash e : A_1 \vee A_2}\;\vee\mathrm{I}_k
\]



Coming up with a good elimination rule is trickier. A number of 
appealing rules are unsound; a sound, yet acceptably strong, rule is 



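\[
\frac{\Gamma \vdash e_0 : A_1 \vee A_2
      \qquad
      \Gamma, x_1 : A_1 \vdash \mathcal{E}[x_1] : C
      \qquad
      \Gamma, x_2 : A_2 \vdash \mathcal{E}[x_2] : C}
     {\Gamma \vdash \mathcal{E}[e_0] : C}\;\vee\mathrm{E}
\]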



This rule types an expression ℰ[e₀] (an evaluation context ℰ
with e₀ in an evaluation position) where e₀ has the union type
A₁ ∨ A₂. During evaluation, e₀ will be some value v₀ such
that either v₀ : A₁ or v₀ : A₂. In the former case, the premise
x₁ : A₁ ⊢ ℰ[x₁] : C tells us that substituting v₀ for x₁ gives a well-typed
expression ℰ[v₀]. Similarly, the premise x₂ : A₂ ⊢ ℰ[x₂] : C
tells us we can safely substitute v₀ for x₂.

The restriction to a single occurrence of e₀ in an evaluation position
is needed for soundness in many settings: generally, in any
operational semantics in which e₀ might step to different expressions.
One simple example is a function f : (A → A → C) ∧
(B → B → C) and expression e₀ : A ∨ B, where e₀ changes
the contents pointed to by a reference of type (A ∨ B) ref, before
returning the new value. The application f e₀ e₀ would be well-typed
by a rule allowing multiple occurrences of e₀, but unsound:
the first e₀ could evaluate to an A and the second e₀ to a B.

The evaluation context ℰ need not be unique, which creates
some difficulties for practical typechecking (Dunfield 2011). For
further discussion of this rule, see Dunfield and Pfenning (2003).

We saw in Section 2 that, in the usual λ-calculus, ∧ does not
correspond to conjunction; in particular, no λ-term behaves like
both the I and S combinators, so the intersection (A→A) ∧ D
(where D is the type of S) is uninhabited. In our setting, though,
(A→A) ∧ D is inhabited, by the merge of I and S.

Something similar comes up when eliminating unions. Without
the merge construct, certain instances of union types can't
be usefully eliminated. Consider a list whose elements have type
int ∨ string. Introducing those unions to create the list is easy
enough: use ∨I₁ for the ints and ∨I₂ for the strings. Now suppose
we want to print a list element x : int ∨ string, converting
the ints to their string representation and leaving the strings
alone. To do this, we need a merge; for example, given a function
g : (int → string) ∧ (string → string) whose body contains a
merge, use rule ∨E on g x with ℰ = g [] and e₀ = x.

Like intersections, unions can be tamed by elaboration. Instead
of products, we elaborate unions to products' dual, sums (tagged
unions). Uses of ∨I₁ and ∨I₂ become left and right injections into
a sum type; uses of ∨E become ordinary case expressions.
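The int ∨ string example can be sketched after elaboration as follows. This is a hedged illustration in Python: the tag strings, the helpers `inj1` and `inj2`, and the function `show_element` are hypothetical names, and tuples stand in for the target language's sums and products.

```python
# Sum injections: the elaborated forms of ∨I1 and ∨I2 (assumed encoding).
def inj1(value):
    return ("inj1", value)

def inj2(value):
    return ("inj2", value)

# A list whose elements have type int ∨ string, after elaboration.
items = [inj1(42), inj2("hello")]

# g : (int -> string) ∧ (string -> string), whose body is a merge,
# elaborates to a pair of ordinary functions.
g = (lambda n: str(n), lambda s: s)

def show_element(x):
    # The use of ∨E on  g x  becomes an ordinary case expression;
    # each branch projects the matching component of g.
    tag, payload = x
    if tag == "inj1":
        return g[0](payload)
    return g[1](payload)

rendered = [show_element(x) for x in items]
```

Each branch of the case uses a different projection of `g`, which is why the two branches can be computationally distinct.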

4. Source Language 
4.1 Source Syntax 



Source types          A, B, C ::= ⊤ | A → B | A ∧ B | A ∨ B
Typing contexts       Γ ::= · | Γ, x : A
Source expressions    e ::= x | () | λx. e | e₁ e₂ | fix x. e | e₁ ,, e₂
Source values         v ::= x | () | λx. e | v₁ ,, v₂
Evaluation contexts   ℰ ::= [] | ℰ e | v ℰ | ℰ ,, e | e ,, ℰ

Figure 2. Syntax of source types, contexts and expressions
The source language expressions e are standard, except for
the feature central to our approach, the merge e₁ ,, e₂. The types
A, B, C are a "top" type ⊤ (which will be elaborated to unit), the
usual function space A → B, intersection A ∧ B and union A ∨ B.
Values v are standard, but a merge of values v₁ ,, v₂ is considered
a value, even though it can step! But the step it takes is pure, in 
the sense that even if we incorporated (say) mutable references, it 
would not interact with them. 

4.2 Source Operational Semantics 



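e ⇝ e′    Source expression e steps to e′    (step E E′ in step.elf)

\[
\frac{e_1 \rightsquigarrow e_1'}{e_1\, e_2 \rightsquigarrow e_1'\, e_2}\;\text{step/app1}
\qquad
\frac{e_2 \rightsquigarrow e_2'}{v_1\, e_2 \rightsquigarrow v_1\, e_2'}\;\text{step/app2}
\]
\[
\frac{}{(\lambda x.\, e)\, v \rightsquigarrow [v/x]e}\;\text{step/beta}
\qquad
\frac{}{\mathsf{fix}\ x.\, e \rightsquigarrow [(\mathsf{fix}\ x.\, e)/x]e}\;\text{step/fix}
\]
\[
\frac{}{e_1 \mathbin{,,} e_2 \rightsquigarrow e_1}\;\text{step/unmerge left}
\qquad
\frac{}{e_1 \mathbin{,,} e_2 \rightsquigarrow e_2}\;\text{step/unmerge right}
\]
\[
\frac{e_1 \rightsquigarrow e_1'}{e_1 \mathbin{,,} e_2 \rightsquigarrow e_1' \mathbin{,,} e_2}\;\text{step/merge1}
\qquad
\frac{e_2 \rightsquigarrow e_2'}{e_1 \mathbin{,,} e_2 \rightsquigarrow e_1 \mathbin{,,} e_2'}\;\text{step/merge2}
\qquad
\frac{}{e \rightsquigarrow e \mathbin{,,} e}\;\text{step/split}
\]

Figure 3. Source language operational semantics:
call-by-value + merge construct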

The source language operational semantics (Figure 3) is standard
(call-by-value function application and a fixed point expression)
except for the merge construct. This peculiar animal is a
descendant of "demonic choice": by the 'step/unmerge left' and
'step/unmerge right' rules, e₁ ,, e₂ can step to either e₁ or e₂.
Adding to its misbehaviours, it permits stepping within itself
('step/merge1' and 'step/merge2'; note that in 'step/merge2', we
don't require e₁ to be a value). Worst of all, it can appear by spontaneous
fission: 'step/split' turns any expression e into a merge of
two copies of e.

The merge construct makes our source language operational semantics
interesting. It also makes it unrealistic: ⇝-reduction does
not preserve types. For type preservation to hold, the operational
semantics would need access to the typing derivation. Worse, since
the typing rule for merges ignores the unused part of the merge,
reduction can produce expressions that have no type at all, or are
not even closed! The point of the source operational semantics is
not to directly model computation; rather, it is a basis for checking
that the elaborated program (whose operational semantics is perfectly
standard) makes sense. We will show in Section 6 that, if the
result M of elaborating e can step to some M′, then we can step
e ⇝* e′ where e′ elaborates to M′.
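The nondeterminism of the source semantics can be explored with a small sketch that enumerates every one-step successor of an expression. This is an illustrative Python encoding, not the Twelf development: the class names, `is_value`, `subst` and `steps` are hypothetical, and substitution is naive (it assumes the substituted term is closed).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Unit:
    pass

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

@dataclass(frozen=True)
class Fix:
    name: str
    body: object

@dataclass(frozen=True)
class Merge:
    left: object
    right: object

def is_value(e):
    # A merge of values counts as a value, even though it can step.
    if isinstance(e, Merge):
        return is_value(e.left) and is_value(e.right)
    return isinstance(e, (Var, Unit, Lam))

def subst(e, x, s):
    """[s/x]e, naive: assumes s is closed, so no capture can occur."""
    if isinstance(e, Var):
        return s if e.name == x else e
    if isinstance(e, Lam):
        return e if e.param == x else Lam(e.param, subst(e.body, x, s))
    if isinstance(e, Fix):
        return e if e.name == x else Fix(e.name, subst(e.body, x, s))
    if isinstance(e, App):
        return App(subst(e.fn, x, s), subst(e.arg, x, s))
    if isinstance(e, Merge):
        return Merge(subst(e.left, x, s), subst(e.right, x, s))
    return e  # Unit

def steps(e):
    """All expressions reachable from e in exactly one source step."""
    out = [Merge(e, e)]                                       # step/split
    if isinstance(e, App):
        out += [App(f, e.arg) for f in steps(e.fn)]           # step/app1
        if is_value(e.fn):
            out += [App(e.fn, a) for a in steps(e.arg)]       # step/app2
        if isinstance(e.fn, Lam) and is_value(e.arg):
            out.append(subst(e.fn.body, e.fn.param, e.arg))   # step/beta
    elif isinstance(e, Fix):
        out.append(subst(e.body, e.name, e))                  # step/fix
    elif isinstance(e, Merge):
        out += [e.left, e.right]                              # step/unmerge left, right
        out += [Merge(left, e.right) for left in steps(e.left)]    # step/merge1
        out += [Merge(e.left, right) for right in steps(e.right)]  # step/merge2
    return out

identity = Lam("x", Var("x"))
program = App(Merge(identity, Unit()), Unit())
successors = steps(program)
```

Among the successors of `program` are both `App(identity, Unit())`, which can go on to produce `()`, and the stuck, untypable `App(Unit(), Unit())`, illustrating that only some reduction sequences preserve types.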

4.3 (Source) Subtyping 

Suppose we want to pass a function f : A → C to a function
g : ((A ∧ B) → C) → D. This should be possible, since f
requires only that its argument have type A; in all calls from g
the argument to f will also have type B, but f won't mind. With
only the rules discussed so far, however, the application g f is not
well-typed: we can't get inside the arrow (A ∧ B) → C. For
flexibility, we'll incorporate a subtyping system that can conclude,
for example, A → C ≤ (A ∧ B) → C.

The logic of the subtyping rules (Figure 4, top) is taken straight
from Dunfield and Pfenning (2003), so we only briefly give some
intuition. Roughly, A ≤ B is sound if every value of type A can be
treated as having type B. Under a subset interpretation, this would
mean that A ≤ B is justified if the set of A-values is a subset of
the set of B-values. For example, the rule ∧R≤, if interpreted set-theoretically,
says that if A ⊆ B₁ and A ⊆ B₂ then A ⊆ (B₁ ∩ B₂).

It is easy to show that subtyping is reflexive and transitive; see
sub-refl.elf and sub-trans.elf. (Building transitivity into
the structure of the rules makes it easy to derive an algorithm; an
explicit transitivity rule would have premises A ≤ B and B ≤ C,
which involve an intermediate type B that does not appear in the
conclusion A ≤ C.)

Having said all that, the subsequent theoretical development
is easier without subtyping. So we will show (Theorem 1) that,
given a typing derivation that uses subtyping (through the usual
subsumption rule), we can always construct a source expression
of the same type that never applies the subsumption rule. This
new expression will be the same as the original one, with a few
additional coercions. For the example above, we essentially η-expand
g f to g (λx. f x), which lets us apply ∧E₁ to x : A ∧ B.
Operationally, all the coercions are identities; they serve only to
"articulate" the type structure, making subsumption unnecessary.

Note that the coercion in rule ∨L≤ is eta-expanded to allow
∨E to eliminate the union in the type of x; as discussed later, the
subexpression of union type must be in evaluation position.
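How the subtyping rules of Figure 4 produce coercions can be sketched as a naive recursive search. This is an assumption-laden illustration, not the prototype's algorithm: types are encoded as nested tuples, coercions are built as strings, and the order in which rules are tried (`coerce` applies the right rule for ∧ and the left rule for ∨ first) is a choice made here for simplicity, with no claim of completeness.

```python
TOP = ("top",)

def coerce(a, b):
    """Return a coercion term (as a string) for a <= b, or None if the search fails."""
    if b[0] == "top":                                    # rule ⊤R≤
        return "λx. ()"
    if b[0] == "and":                                    # rule ∧R≤
        e1, e2 = coerce(a, b[1]), coerce(a, b[2])
        if e1 is None or e2 is None:
            return None
        return f"({e1}) ,, ({e2})"
    if a[0] == "or":                                     # rule ∨L≤
        e1, e2 = coerce(a[1], b), coerce(a[2], b)
        if e1 is None or e2 is None:
            return None
        return f"λx. (λy. ({e1}) y ,, ({e2}) y) x"
    if a[0] == "and":                                    # rules ∧L1≤, ∧L2≤
        for part in (a[1], a[2]):
            e = coerce(part, b)
            if e is not None:
                return e
    if b[0] == "or":                                     # rules ∨R1≤, ∨R2≤
        for part in (b[1], b[2]):
            e = coerce(a, part)
            if e is not None:
                return e
    if a[0] == "arrow" and b[0] == "arrow":              # rule for →
        e_dom = coerce(b[1], a[1])   # contravariant in the domain
        e_cod = coerce(a[2], b[2])   # covariant in the codomain
        if e_dom is not None and e_cod is not None:
            return f"λf. λx. ({e_cod}) (f (({e_dom}) x))"
    return None

# Instance of  A -> C <= (A ∧ B) -> C  with hypothetical choices of A, B, C.
A = ("arrow", TOP, TOP)
B = ("arrow", TOP, ("arrow", TOP, TOP))
C = TOP
f_type = ("arrow", A, C)
g_arg_type = ("arrow", ("and", A, B), C)
forward = coerce(f_type, g_arg_type)
backward = coerce(g_arg_type, f_type)
```

Here `forward` is a coercion term while `backward` is `None`, matching the intuition that f can be passed where an (A ∧ B) → C is expected but not the other way around.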

4.4 Source Typing 

The source typing rules (Figure 4) are either standard or have
already been discussed in Sections 2 and 3, except for direct.

The direct rule was introduced and justified in Dunfield and
Pfenning (2003, 2004). It is a 1-ary version of ∨E, a sort of cut:
a use of the typing e₀ : A within the derivation of ℰ[e₀] : C
is replaced by a derivation of e₀ : A, along with a derivation
of ℰ[x] : C that assumes x : A. Curiously, in this system of
rules, direct is admissible: given e₀ : A, use ∨I₁ or ∨I₂ to
conclude e₀ : A ∨ A, then use two copies of the derivation
x : A ⊢ ℰ[x] : C in the premises of ∨E (α-converting x as needed).
So why include it? Typing using these rules is undecidable; our
implementation (Section 9) follows a bidirectional version of them
(where typechecking is decidable, given a few annotations, similar
to Dunfield and Pfenning (2004)), where direct is not admissible.
(A side benefit is that direct and ∨E are similar enough that it can
be helpful to do the direct case of a proof before tackling ∨E.)

Remark. Theorem 1, and all subsequent theorems, are proved only
for expressions that are closed under the appropriate context, even
though mergeₖ does not explicitly require that the unexamined
subexpression be closed; Twelf does not support proofs about objects
with unknown variables.

Theorem 1 (Coercion). If 𝒟 derives Γ ⊢ e : B then there exists
an e′ such that 𝒟′ derives Γ ⊢ e′ : B, where 𝒟′ never uses rule
sub.

Proof. By induction on 𝒟. The interesting cases are for sub and
∨E. In the case for sub with A ≤ B, we show that when the
coercion e_coerce (which always has the form λx. e₀) is applied
to an expression of type A, we get an expression of type B. For
example, for ∧L₁≤ we use ∧E₁. This shows that e′ = (λx. e₀) e
has type B.

For ∨E, the premises typing ℰ[xₖ] might "separate", say if the
first includes subsumption (yielding the same ℰ[x₁]) and the second
doesn't. Furthermore, inserting coercions could break evaluation
positions: given ℰ = f [], replacing f with an application (e_coerce f)
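A ≤ B ::: e    Source type A is a subtype of source type B,
with coercion e of type · ⊢ e : A → B    (sub A B Coe CoeTyping in typeof+sub.elf)

\[
\frac{B_1 \le A_1 \mathrel{:::} e \qquad A_2 \le B_2 \mathrel{:::} e'}
     {A_1 \to A_2 \le B_1 \to B_2 \mathrel{:::} \lambda f.\, \lambda x.\, e'\,(f\,(e\,x))}\;{\to}{\le}
\qquad
\frac{}{A \le \top \mathrel{:::} \lambda x.\, ()}\;\top\mathrm{R}{\le}
\]
\[
\frac{A_k \le B \mathrel{:::} e}{A_1 \wedge A_2 \le B \mathrel{:::} e}\;\wedge\mathrm{L}_k{\le}
\qquad
\frac{A \le B_1 \mathrel{:::} e_1 \qquad A \le B_2 \mathrel{:::} e_2}
     {A \le B_1 \wedge B_2 \mathrel{:::} e_1 \mathbin{,,} e_2}\;\wedge\mathrm{R}{\le}
\]
\[
\frac{A_1 \le B \mathrel{:::} e_1 \qquad A_2 \le B \mathrel{:::} e_2}
     {A_1 \vee A_2 \le B \mathrel{:::} \lambda x.\, (\lambda y.\, e_1\, y \mathbin{,,} e_2\, y)\, x}\;\vee\mathrm{L}{\le}
\qquad
\frac{A \le B_k \mathrel{:::} e}{A \le B_1 \vee B_2 \mathrel{:::} e}\;\vee\mathrm{R}_k{\le}
\]

Γ ⊢ e : A    Source expression e has source type A    (typeof+sub E A in typeof+sub.elf)

\[
\frac{}{\Gamma_1, x : A, \Gamma_2 \vdash x : A}\;\mathrm{var}
\qquad
\frac{\Gamma \vdash e_k : A}{\Gamma \vdash e_1 \mathbin{,,} e_2 : A}\;\mathrm{merge}_k
\qquad
\frac{\Gamma, x : A \vdash e : A}{\Gamma \vdash \mathsf{fix}\ x.\, e : A}\;\mathrm{fix}
\]
\[
\frac{}{\Gamma \vdash v : \top}\;\top\mathrm{I}
\qquad
\frac{\Gamma, x : A \vdash e : B}{\Gamma \vdash \lambda x.\, e : A \to B}\;{\to}\mathrm{I}
\qquad
\frac{\Gamma \vdash e_1 : A \to B \qquad \Gamma \vdash e_2 : A}{\Gamma \vdash e_1\, e_2 : B}\;{\to}\mathrm{E}
\]
\[
\frac{\Gamma \vdash e : A_1 \qquad \Gamma \vdash e : A_2}{\Gamma \vdash e : A_1 \wedge A_2}\;\wedge\mathrm{I}
\qquad
\frac{\Gamma \vdash e : A_1 \wedge A_2}{\Gamma \vdash e : A_k}\;\wedge\mathrm{E}_k
\qquad
\frac{\Gamma \vdash e : A_k}{\Gamma \vdash e : A_1 \vee A_2}\;\vee\mathrm{I}_k
\]
\[
\frac{\Gamma \vdash e_0 : A \qquad \Gamma, x : A \vdash \mathcal{E}[x] : C}
     {\Gamma \vdash \mathcal{E}[e_0] : C}\;\mathrm{direct}
\qquad
\frac{\Gamma \vdash e_0 : A_1 \vee A_2
      \qquad \Gamma, x_1 : A_1 \vdash \mathcal{E}[x_1] : C
      \qquad \Gamma, x_2 : A_2 \vdash \mathcal{E}[x_2] : C}
     {\Gamma \vdash \mathcal{E}[e_0] : C}\;\vee\mathrm{E}
\]
\[
\frac{\Gamma \vdash e : A \qquad A \le B \mathrel{:::} e_{\mathrm{coerce}}}{\Gamma \vdash e : B}\;\mathrm{sub}
\]

Figure 4. Source type system, with subsumption, non-elaborating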
Figure 4. Source type system, with subsumption, non-elaborating 



means that [ ] is no longer in evaluation position. To handle these
issues, let e′ = (λy. e₁′ ,, e₂′) e₀′, where e₀′ comes from applying the
induction hypothesis to the derivation of Γ ⊢ e₀ : A₁ ∨ A₂, and
e₁′ and e₂′ come from applying the induction hypothesis to the other
two premises. Now e₀′ is in evaluation position, because it follows
a λ; the mergeₖ typing rule will choose the correct branch.

For details, see coerce.elf. We actually encode the typings
for e_coerce as hypothetical derivations in the subtyping judgment
itself (typeof+sub.elf), making the sub case here trivial. □

5. Target Language 

Our target language is just the simply-typed call-by-value λ-calculus
extended with fixed point expressions, products, and sums.



5.1 Target Syntax 

Target types       T ::= unit | T → T | T * T | T + T
Typing contexts    G ::= · | G, x : T
Target terms    M, N ::= x | () | λx. M | M N | fix x. M
                       | (M₁, M₂) | projₖ M
                       | injₖ M | case M of inj₁ x₁ ⇒ N₁ | inj₂ x₂ ⇒ N₂
Target values      W ::= x | () | λx. M | (W₁, W₂) | injₖ W

Figure 5. Target types and terms

The target types and terms (Figure 5) are completely standard.



5.2 Target Typing 

The typing rules for the target language (Figure 6) lack any form of
subtyping, and are completely standard.

5.3 Target Operational Semantics 

The operational semantics M ↦ M′ is, likewise, standard; functions
are call-by-value and products are strict. As usual, we write
M ↦* M′ for a sequence of zero or more ↦s.
Naturally, a type safety result holds:

Theorem 2 (Target Type Safety). If · ⊢ M : T then either M is a
value, or M ↦ M′ and · ⊢ M′ : T.

Proof. By induction on the given derivation, using a few standard
lemmas; see tm-safety.elf. (The necessary substitution lemma
comes for free in Twelf.) □

And to calm any doubts about whether M might step to some
other, not necessarily well-typed term:

Theorem 3 (Determinism of ↦).
If M ↦ N₁ and M ↦ N₂ then N₁ = N₂ (up to α-conversion).

Proof. By simultaneous induction. See tm-deterministic in
tm-safety.elf. □



6. Elaboration Typing 

We elaborate source expressions e into target terms M. The source 
expressions, which include a "merge" construct e₁ ,, e₂, are typed
with intersections and unions, but the result of elaboration is completely
standard and can be typed with just unit, →, * and +.
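G ⊢ M : T    Target term M has target type T    (typeoftm M T in typeoftm.elf)

\[
\frac{}{G_1, x : T, G_2 \vdash x : T}\;\text{typeoftm/var}
\qquad
\frac{G, x : T \vdash M : T}{G \vdash \mathsf{fix}\ x.\, M : T}\;\text{typeoftm/fix}
\qquad
\frac{}{G \vdash () : \mathsf{unit}}\;\text{typeoftm/unitintro}
\]
\[
\frac{G, x : T_1 \vdash M : T_2}{G \vdash \lambda x.\, M : (T_1 \to T_2)}\;\text{typeoftm/arrintro}
\qquad
\frac{G \vdash M_1 : T \to T' \qquad G \vdash M_2 : T}{G \vdash M_1\, M_2 : T'}\;\text{typeoftm/arrelim}
\]
\[
\frac{G \vdash M_1 : T_1 \qquad G \vdash M_2 : T_2}{G \vdash (M_1, M_2) : (T_1 * T_2)}\;\text{typeoftm/prodintro}
\qquad
\frac{G \vdash M : (T_1 * T_2)}{G \vdash (\mathsf{proj}_k\, M) : T_k}\;\text{typeoftm/prodelim}_k
\]
\[
\frac{G \vdash M : T_k}{G \vdash (\mathsf{inj}_k\, M) : (T_1 + T_2)}\;\text{typeoftm/sumintro}_k
\]
\[
\frac{G \vdash M : T_1 + T_2 \qquad G, x_1 : T_1 \vdash N_1 : T \qquad G, x_2 : T_2 \vdash N_2 : T}
     {G \vdash (\mathsf{case}\ M\ \mathsf{of}\ \mathsf{inj}_1\, x_1 \Rightarrow N_1 \mid \mathsf{inj}_2\, x_2 \Rightarrow N_2) : T}\;\text{typeoftm/sumelim}
\]

Figure 6. Target type system with functions, products and sums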



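M ↦ M′    Target term M steps to M′    (steptm M M′ in steptm.elf)

\[
\frac{M_1 \mapsto M_1'}{M_1\, M_2 \mapsto M_1'\, M_2}
\qquad
\frac{M_2 \mapsto M_2'}{W_1\, M_2 \mapsto W_1\, M_2'}
\]
\[
\frac{}{(\lambda x.\, M)\, W \mapsto [W/x]M}
\qquad
\frac{}{\mathsf{fix}\ x.\, M \mapsto [(\mathsf{fix}\ x.\, M)/x]M}
\]
\[
\frac{M \mapsto M'}{\mathsf{proj}_k\, M \mapsto \mathsf{proj}_k\, M'}
\qquad
\frac{}{\mathsf{proj}_k\, (W_1, W_2) \mapsto W_k}
\]
\[
\frac{M_1 \mapsto M_1'}{(M_1, M_2) \mapsto (M_1', M_2)}
\qquad
\frac{M_2 \mapsto M_2'}{(W_1, M_2) \mapsto (W_1, M_2')}
\]
\[
\frac{M \mapsto M'}{\mathsf{inj}_k\, M \mapsto \mathsf{inj}_k\, M'}
\qquad
\frac{M \mapsto M'}{\mathsf{case}\ M\ \mathsf{of}\ MS \mapsto \mathsf{case}\ M'\ \mathsf{of}\ MS}
\]
\[
\frac{}{\mathsf{case}\ \mathsf{inj}_k\, W\ \mathsf{of}\ \mathsf{inj}_1\, x_1 \Rightarrow N_1 \mid \mathsf{inj}_2\, x_2 \Rightarrow N_2 \;\mapsto\; [W/x_k]N_k}
\]

Figure 7. Target language operational semantics:
call-by-value + products + sums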

The elaboration judgment Γ ⊢ e : A ↪ M is read "under
assumptions Γ, source expression e has type A and elaborates to
target term M". While not written explicitly in the judgment, the
elaboration rules ensure that M has type |A|, the type translation of
A (Figure 8). For example, |⊤ ∧ (⊤→⊤)| = unit * (unit → unit).

To simplify the technical development, the elaboration rules
work only for source expressions that can be typed without using
the subsumption rule sub (Figure 4). Such source expressions can
always be produced (Theorem 1 above).

The rest of this section discusses the elaboration rules and 
proves related properties: 

§6.1 connects elaboration, source typing, and target typing;

§6.2 gives lemmas useful for showing that target computations correspond
to source computations;

§6.3 states and proves that correspondence (consistency, Thm. 13);

§6.4 summarizes the metatheory through two important corollaries
of our various theorems.

Finally, Section 6.5 discusses whether we need a value restriction
on ∧I.



|⊤|         =  unit
|A₁ → A₂|   =  |A₁| → |A₂|
|A₁ ∧ A₂|   =  |A₁| * |A₂|
|A₁ ∨ A₂|   =  |A₁| + |A₂|

Figure 8. Type translation
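The type translation is a direct structural recursion, which can be sketched in a few lines. The tuple encoding of types and the names `translate` and `show` are assumptions made for this illustration only.

```python
TOP = ("top",)

def translate(a):
    """Map a source type to its target type: ⊤ to unit, ∧ to *, ∨ to +."""
    tag = a[0]
    if tag == "top":
        return ("unit",)
    target_tag = {"arrow": "arrow", "and": "prod", "or": "sum"}[tag]
    return (target_tag, translate(a[1]), translate(a[2]))

def show(t):
    """Render a target type, parenthesizing every compound component."""
    if t[0] == "unit":
        return "unit"
    op = {"arrow": "->", "prod": "*", "sum": "+"}[t[0]]

    def atom(u):
        return show(u) if u[0] == "unit" else "(" + show(u) + ")"

    return atom(t[1]) + " " + op + " " + atom(t[2])

# The example from the text: |⊤ ∧ (⊤ → ⊤)|.
example = translate(("and", TOP, ("arrow", TOP, TOP)))
example_text = show(example)
```

Because each source constructor maps to exactly one target constructor, the translation is a bijection between source and target types.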



6.1 Connecting Elaboration and Typing 

Equivalence of elaboration and source typing: The non-elaborating
type assignment system of Figure 4, minus sub, can be read off
from the elaboration rules in Figure 9: simply drop the ↪ …
part of the judgment. Consequently, given e : A ↪ M we can
always derive e : A:

Theorem 4.
If Γ ⊢ e : A ↪ M then Γ ⊢ e : A (without using rule sub).

Proof. By straightforward induction on the given derivation; see
typeof-erase in typeof-elab.elf. □

More interestingly, given e : A we can always elaborate e, so
elaboration is just as expressive as typing:

Theorem 5 (Completeness of Elaboration).
If Γ ⊢ e : A (without using rule sub) then Γ ⊢ e : A ↪ M.

Proof. By straightforward induction on the given derivation; see
elab-complete in typeof-elab.elf. □

Elaboration produces well-typed terms: Any target term M produced
by the elaboration rules has corresponding target type. In
the theorem statement, we assume the obvious translation |Γ| (e.g.
|x : ⊤, y : ⊤ ∨ ⊤| = x : |⊤|, y : |⊤ ∨ ⊤| = x : unit, y : unit + unit).

Theorem 6 (Elaboration Type Soundness).
If Γ ⊢ e : A ↪ M then |Γ| ⊢ M : |A|.

Proof. By induction on the given derivation. For example, the case
for direct, which elaborates to an application, applies typeoftm/arrintro
and typeoftm/arrelim. Exploiting a bijection between
source types and target types, we actually prove Γ ⊢ M : A,
interpreting A and types in Γ as target types: ∧ as *, etc. See
elab-type-soundness.elf. □
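$\Gamma \vdash e : A \hookrightarrow M$    Source expression $e$ has source type $A$
and elaborates to target term $M$ (of type $|A|$)
(elab E A M in elab.elf)

\[ \frac{}{\Gamma_1, x : A, \Gamma_2 \vdash x : A \hookrightarrow x}\ \text{var}
\qquad
\frac{\Gamma \vdash e_k : A \hookrightarrow M}{\Gamma \vdash e_1 \mathbin{,,} e_2 : A \hookrightarrow M}\ \text{merge}_k \]

\[ \frac{\Gamma, x : A \vdash e : A \hookrightarrow M}{\Gamma \vdash \mathsf{fix}\ x.\, e : A \hookrightarrow \mathsf{fix}\ x.\, M}\ \text{fix}
\qquad
\frac{}{\Gamma \vdash v : \top \hookrightarrow ()}\ \top\text{I} \]

\[ \frac{\Gamma, x : A \vdash e : B \hookrightarrow M}{\Gamma \vdash \lambda x.\, e : A \to B \hookrightarrow \lambda x.\, M}\ {\to}\text{I}
\qquad
\frac{\Gamma \vdash e_1 : A \to B \hookrightarrow M_1 \qquad \Gamma \vdash e_2 : A \hookrightarrow M_2}{\Gamma \vdash e_1\, e_2 : B \hookrightarrow M_1\, M_2}\ {\to}\text{E} \]

\[ \frac{\Gamma \vdash e : A_1 \hookrightarrow M_1 \qquad \Gamma \vdash e : A_2 \hookrightarrow M_2}{\Gamma \vdash e : A_1 \wedge A_2 \hookrightarrow (M_1, M_2)}\ \wedge\text{I}
\qquad
\frac{\Gamma \vdash e : A_1 \wedge A_2 \hookrightarrow M}{\Gamma \vdash e : A_k \hookrightarrow \mathsf{proj}_k\, M}\ \wedge\text{E}_k \]

\[ \frac{\Gamma \vdash e : A_k \hookrightarrow M}{\Gamma \vdash e : A_1 \vee A_2 \hookrightarrow \mathsf{inj}_k\, M}\ \vee\text{I}_k
\qquad
\frac{\Gamma \vdash e_0 : A \hookrightarrow M_0 \qquad \Gamma, x : A \vdash \mathcal{E}[x] : C \hookrightarrow N}{\Gamma \vdash \mathcal{E}[e_0] : C \hookrightarrow (\lambda x.\, N)\, M_0}\ \text{direct} \]

\[ \frac{\Gamma \vdash e_0 : A_1 \vee A_2 \hookrightarrow M_0 \qquad \Gamma, x_1 : A_1 \vdash \mathcal{E}[x_1] : C \hookrightarrow N_1 \qquad \Gamma, x_2 : A_2 \vdash \mathcal{E}[x_2] : C \hookrightarrow N_2}{\Gamma \vdash \mathcal{E}[e_0] : C \hookrightarrow \mathsf{case}\ M_0\ \mathsf{of}\ \mathsf{inj}_1\, x_1 \Rightarrow N_1 \mid \mathsf{inj}_2\, x_2 \Rightarrow N_2}\ \vee\text{E} \]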



Figure 9. Elaboration typing rules 



6.2 Relating Source Expressions to Target Terms 

Elaboration produces a term that corresponds closely to the source
expression: a target term is the same as a source expression, except
that the intersection- and union-related aspects of the computation
become explicit in the target. For instance, intersection elimination
via $\wedge$E$_2$, implicit in the source program, becomes the explicit
projection $\mathsf{proj}_2$. The target term has nearly the same structure
as the source; the elaboration rules only insert operations such as
$\mathsf{proj}_2$, duplicate subterms such as the $e$ in $\wedge$I, and omit unused
parts of merges.

This gives rise to a relatively simple connection between source
expressions and target terms, much simpler than a logical relation,
which relates all appropriately-typed terms that have the same
extensional behaviour. In fact, stepping in the target preserves
elaboration typing, provided we are allowed to step the source
expression zero or more times. This consistency result, Theorem 13, needs
several lemmas.

Lemma 7. If $e \leadsto^* e'$ then $\mathcal{E}[e] \leadsto^* \mathcal{E}[e']$.

Proof. By induction on the number of steps, using a lemma
(step-eval-context) that $e \leadsto e'$ implies $\mathcal{E}[e] \leadsto \mathcal{E}[e']$. See
step*eval-context in step-eval-context.elf. □

Next, we prove inversion properties of unions, intersections and
arrows. Roughly, we want to say that if an expression of union
type elaborates to an injection $\mathsf{inj}_k\, M_0$, it also elaborates to $M_0$.
For intersections, the property is slightly more complicated: given
an expression of intersection type that elaborates to a pair, we
can step the expression to get something that elaborates to the
components of the pair. Similarly, given an expression of arrow
type that elaborates to a $\lambda$-abstraction, we can step the expression
to a $\lambda$-abstraction.

Lemma 8 (Unions/Injections).

If $\Gamma \vdash e : A_1 \vee A_2 \hookrightarrow \mathsf{inj}_k\, M$ then $\Gamma \vdash e : A_k \hookrightarrow M$.

Proof. By induction on the derivation of $\Gamma \vdash e : C \hookrightarrow M$.
The only possible cases are merge$_k$ and $\vee$I$_k$. See elab-inl and
elab-inr in elab-union.elf. □

Lemma 9 (Intersections/Pairs).

If $\Gamma \vdash e : A_1 \wedge A_2 \hookrightarrow (M_1, M_2)$
then there exist $e_1'$ and $e_2'$ such that

(1) $e \leadsto^* e_1'$ and $\Gamma \vdash e_1' : A_1 \hookrightarrow M_1$, and

(2) $e \leadsto^* e_2'$ and $\Gamma \vdash e_2' : A_2 \hookrightarrow M_2$.

Proof. By induction on the given derivation; the only possible cases
are $\wedge$I and merge$_k$. See elab-sect.elf. □

Lemma 10 (Arrows/Lambdas).

If $\cdot \vdash e : A \to B \hookrightarrow \lambda x.\, M_0$ then there exists $e_0$
such that $e \leadsto^* \lambda x.\, e_0$ and $x : A \vdash e_0 : B \hookrightarrow M_0$.

Proof. By induction on the given derivation; the only possible cases
are $\to$I and merge$_k$. See elab-arr.elf. □

Our last interesting lemma shows that if an expression $e$ elaborates
to a target value $W$, we can step $e$ to some value $v$ that also
elaborates to $W$.

Lemma 11 (Value monotonicity). If $\Gamma \vdash e : A \hookrightarrow W$ then
$e \leadsto^* v$ where $\Gamma \vdash v : A \hookrightarrow W$.

Proof. By induction on the given derivation.

The most interesting case is for $\wedge$I, where we apply the induction
hypothesis to each premise (yielding $v_1'$, $v_2'$ such that $e \leadsto^* v_1'$
and $e \leadsto^* v_2'$), apply the 'step/split' rule to turn $e$ into $e \mathbin{,,} e$, and
use the 'step/merge1' and 'step/merge2' rules to step each part of
the merge, yielding $v_1' \mathbin{,,} v_2'$, which is a value.

In the merge$_k$ case on a merge $e_1 \mathbin{,,} e_2$, we apply the induction
hypothesis to $e_k$, giving $e_k \leadsto^* v$. By rule 'step/unmerge',
$e_1 \mathbin{,,} e_2 \leadsto e_k$, from which $e_1 \mathbin{,,} e_2 \leadsto^* v$.

See value-mono.elf. □

Lemma 12 (Substitution). If $\Gamma, x : A \vdash e : B \hookrightarrow M$ and
$\Gamma \vdash v : A \hookrightarrow W$ then $\Gamma \vdash [v/x]e : B \hookrightarrow [W/x]M$.

Proof. By induction on the first derivation. As usual, Twelf gives
us this substitution lemma for free. □

6.3 Consistency 

This theorem is the linchpin: given e that elaborates to M, we 
can preserve the elaboration relationship even after stepping M, 
though we may have to step e some number of times as well. The 
expression e and term M, in general, step at different speeds: 

• $M$ steps while $e$ doesn't: for example, if $M$ is $\mathsf{proj}_1\, (W_1, W_2)$
and steps to $W_1$, there is nothing to do in $e$ because the projection
corresponds to implicit intersection elimination in rule $\wedge$E$_1$;
• $e$ may step more than $M$: for example, if $e$ is $(v_1 \mathbin{,,} v_2)\, v$ and
$M$ is $(\lambda x.\, x)\, W$, then $M$ $\beta$-reduces to $W$, but $e$ must first
'step/unmerge' to the appropriate $v_k$, yielding $v_k\, v$, and then
apply 'step/beta'.

(Note that the converse, that if $e \leadsto e'$ then $M \mapsto^* M'$, does not
hold: we could pick the wrong half of a merge and get a source
expression with no particular relation to $M$.)

Theorem 13 (Consistency).

If $\cdot \vdash e : A \hookrightarrow M$ and $M \mapsto M'$
then there exists $e'$ such that $e \leadsto^* e'$ and $\cdot \vdash e' : A \hookrightarrow M'$.

Proof. By induction on the derivation $\mathcal{D}$ of $\cdot \vdash e : A \hookrightarrow M$. We
show several cases here; the full proof is in consistency.elf.

• Case var, $\top$I, $\to$I: Impossible because $M$ cannot step.

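• Case $\wedge$I:

\[ \mathcal{D} :: \frac{\cdot \vdash e : A_1 \hookrightarrow M_1 \qquad \cdot \vdash e : A_2 \hookrightarrow M_2}{\cdot \vdash e : A_1 \wedge A_2 \hookrightarrow (M_1, M_2)} \]

By inversion, either $M_1 \mapsto M_1'$ or $M_2 \mapsto M_2'$. Suppose the
former (the latter is similar). By i.h., $e \leadsto^* e_1'$ and
$\cdot \vdash e_1' : A_1 \hookrightarrow M_1'$. By 'step/split', $e \leadsto e \mathbin{,,} e$. Repeatedly applying
'step/merge1' gives $e \mathbin{,,} e \leadsto^* e_1' \mathbin{,,} e$.

For typing, apply merge$_1$ with premise $\cdot \vdash e_1' : A_1 \hookrightarrow M_1'$
and merge$_2$ with premise $\cdot \vdash e : A_2 \hookrightarrow M_2$.

Finally, by $\wedge$I, we have $\cdot \vdash e_1' \mathbin{,,} e : A_1 \wedge A_2 \hookrightarrow (M_1', M_2)$.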
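• Case $\wedge$E$_k$:

\[ \mathcal{D} :: \frac{\cdot \vdash e : A_1 \wedge A_2 \hookrightarrow M_0}{\cdot \vdash e : A_k \hookrightarrow \mathsf{proj}_k\, M_0} \]

If $\mathsf{proj}_k\, M_0 \mapsto \mathsf{proj}_k\, M_0'$ with $M_0 \mapsto M_0'$, use the i.h. and
apply $\wedge$E$_k$.

If $M_0 = (W_1, W_2)$ and $\mathsf{proj}_k\, M_0 \mapsto W_k$, use Lemma 9,
yielding $e \leadsto^* e_k'$ and $\cdot \vdash e_k' : A_k \hookrightarrow W_k$.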

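• Case merge$_k$:

\[ \mathcal{D} :: \frac{\cdot \vdash e_k : A \hookrightarrow M}{\cdot \vdash e_1 \mathbin{,,} e_2 : A \hookrightarrow M} \]

By i.h., $e_k \leadsto^* e'$ and $\cdot \vdash e' : A \hookrightarrow M'$. By rule 'step/unmerge',
$e_1 \mathbin{,,} e_2 \leadsto e_k$. Therefore $e_1 \mathbin{,,} e_2 \leadsto^* e'$.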

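• Case $\to$E:

\[ \mathcal{D} :: \frac{\cdot \vdash e_1 : A \to B \hookrightarrow M_1 \qquad \cdot \vdash e_2 : A \hookrightarrow M_2}{\cdot \vdash e_1\, e_2 : B \hookrightarrow M_1\, M_2} \]

We show one of the harder subcases (consistency/app/beta
in consistency.elf). In this subcase, $M_1 = \lambda x.\, M_0$ and $M_2$
is a value, with $M_1\, M_2 \mapsto [M_2/x]M_0$. We use several easy
lemmas about stepping; for example, step*app1 says that if
$e_1 \leadsto^* e_1'$ then $e_1\, e_2 \leadsto^* e_1'\, e_2$.

Elab1 ::         $\cdot \vdash e_1 : A \to B \hookrightarrow \lambda x.\, M_0$     Subd.
ElabBody ::      $x : A \vdash e_0 : B \hookrightarrow M_0$                      By Lemma 10
StepsFun ::      $e_1 \leadsto^* \lambda x.\, e_0$                               ″
StepsApp ::      $e_1\, e_2 \leadsto^* (\lambda x.\, e_0)\, e_2$                 By step*app1
Elab2 ::         $\cdot \vdash e_2 : A \hookrightarrow M_2$                      Subd.
                 $M_2$ value                                                      Above
Elab2' ::        $e_2 \leadsto^* v_2$                                            By Lemma 11
                 $\cdot \vdash v_2 : A \hookrightarrow M_2$                      ″
                 $(\lambda x.\, e_0)\, e_2 \leadsto^* (\lambda x.\, e_0)\, v_2$  By step*app2
                 $e_1\, e_2 \leadsto^* (\lambda x.\, e_0)\, v_2$                 By step*append
                 $(\lambda x.\, e_0)\, v_2 \leadsto [v_2/x]e_0$                  By 'step/beta'
StepsAppBeta ::  $e_1\, e_2 \leadsto^* [v_2/x]e_0$                               By step*snoc
ElabBody ::      $x : A \vdash e_0 : B \hookrightarrow M_0$                      Above
                 $\cdot \vdash [v_2/x]e_0 : B \hookrightarrow [M_2/x]M_0$        By Lemma 12 (Elab2') □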



Theorem 14 (Multi-step Consistency).

If $\cdot \vdash e : A \hookrightarrow M$ and $M \mapsto^* W$ then there exists $v$ such that
$e \leadsto^* v$ and $\cdot \vdash v : A \hookrightarrow W$.

Proof. By induction on the derivation of $M \mapsto^* W$.

If $M$ is already the value $W$ (zero steps) then, by Lemma 11,
$e \leadsto^* v$ for some value $v$ with $\cdot \vdash v : A \hookrightarrow W$, which is
what we need.

Otherwise, we have $M \mapsto M'$ where $M' \mapsto^* W$. We want to
show $\cdot \vdash e' : A \hookrightarrow M'$, where $e \leadsto^* e'$. By Theorem 13, either
$\cdot \vdash e : A \hookrightarrow M'$, or $e \leadsto^* e'$ and $\cdot \vdash e' : A \hookrightarrow M'$.

• If $\cdot \vdash e : A \hookrightarrow M'$, let $e' = e$, so $\cdot \vdash e' : A \hookrightarrow M'$ and
$e \leadsto^* e'$ in zero steps.

• If $e \leadsto^* e'$ and $\cdot \vdash e' : A \hookrightarrow M'$, we can use the i.h., showing
that $e' \leadsto^* v$ and $\cdot \vdash v : A \hookrightarrow W$.

See consistency* in consistency.elf. □

6.4 Summing Up 

Theorem 15 (Static Semantics).

If $\cdot \vdash e : A$ (using any of the rules in Figure 4) then there exists $e'$
such that $\cdot \vdash e' : A \hookrightarrow M$ and $\cdot \vdash M : |A|$.

Proof. By Theorems 1 (coercion), 5 (completeness of elaboration)
and 6 (elaboration type soundness). □

Theorem 16 (Dynamic Semantics).

If $\cdot \vdash e : A \hookrightarrow M$ and $M \mapsto^* W$ then there is a source value $v$
such that $e \leadsto^* v$ and $\cdot \vdash v : A$.

Proof. By Theorems 14 (multi-step consistency) and 4. □

Recalling the diagram in Figure 1, Theorem 16 shows that it
commutes.

Both theorems are stated and proved in summary.elf. Combined
with a run of the target program ($M \mapsto^* W$), they show that
elaborated programs are consistent with source programs.

6.5 The Value Restriction 



Davies and Pfenning (2000) showed that the then-standard intersection
introduction (that is, our $\wedge$I) was unsound in a call-by-value
semantics in the presence of effects (specifically, mutable references).
Here is an example (modeled on theirs). Assume a base type nat
with values 0, 1, 2, ... and a type pos of strictly positive naturals
with values 1, 2, ...; assume $\mathsf{pos} \leq \mathsf{nat}$.

    let r = (ref 1) : (nat ref) ∧ (pos ref) in
      r := 0;
      (!r) : pos

Using the unrestricted $\wedge$I rule, r has type (nat ref) $\wedge$ (pos ref);
using $\wedge$E$_1$ yields r : nat ref, so the write r := 0 is well-typed;
using $\wedge$E$_2$ yields r : pos ref, so the read !r produces a pos. In an
unelaborated setting, this typing is unsound: (ref 1) creates a single
cell, initially containing 1, then overwritten with 0, so !r $\leadsto^*$ 0,
which does not have type pos.

Davies and Pfenning proposed, analogously to ML's value restriction
on $\forall$-introduction, an $\wedge$-introduction rule that only types
values $v$. This rule is sound with mutable references:

\[ \frac{v : A_1 \qquad v : A_2}{v : A_1 \wedge A_2}\ \wedge\text{I (Davies and Pfenning)} \]

In an elaboration system like ours, however, the problematic
example above is sound, because our $\wedge$I elaborates ref 1 to two
distinct expressions, which create two unaliased cells:

\[ \frac{\mathsf{ref}\ 1 : \mathsf{nat\ ref} \hookrightarrow \mathsf{ref}\ 1 \qquad \mathsf{ref}\ 1 : \mathsf{pos\ ref} \hookrightarrow \mathsf{ref}\ 1}{\mathsf{ref}\ 1 : \mathsf{nat\ ref} \wedge \mathsf{pos\ ref} \hookrightarrow (\mathsf{ref}\ 1, \mathsf{ref}\ 1)}\ \wedge\text{I} \]

Thus, the example elaborates to

    let r = (ref 1, ref 1) in
      (proj1 r) := 0;
      (!proj2 r) : pos

which is well-typed, but does not "go wrong" in the type-safety
sense: the assignment writes to the first cell ($\wedge$E$_1$), and the
dereference reads the second cell ($\wedge$E$_2$), which still contains the
original value 1. The restriction-free $\wedge$I thus appears sound in our
setting. Being sound is not the same as being useful, though; such
behaviour is less than intuitive, as we discuss in the next section.
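
The difference between the two readings of the example can be simulated directly. The following Python sketch is only an illustration: the `Ref` class and the two function names are assumptions of this example, and Python itself does no static checking of nat versus pos.

```python
class Ref:
    """A mutable cell standing in for an ML reference."""
    def __init__(self, value):
        self.contents = value

def unelaborated():
    """One cell used at both 'nat ref' and 'pos ref'."""
    r = Ref(1)          # ref 1 evaluated once
    r.contents = 0      # r := 0, via the nat ref view
    return r.contents   # !r, via the pos ref view: yields 0, not a pos

def elaborated():
    """Intersection introduction elaborates ref 1 twice, giving two cells."""
    r = (Ref(1), Ref(1))    # (ref 1, ref 1)
    r[0].contents = 0       # (proj1 r) := 0
    return r[1].contents    # !(proj2 r): still the original 1
```

Here `unelaborated()` returns 0, which is not a strictly positive natural, while `elaborated()` returns 1: the write and the read touch different cells.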

7. Coherence 

The merge construct, while simple and powerful, has serious
usability issues when the parts of the merge have overlapping types.
Or, more accurately, when they would have overlapping types
(types with nonempty intersection) in a merge-free system: in our
system, all intersections $A \wedge B$ of nonempty $A$, $B$ are nonempty:
if $v_A : A$ and $v_B : B$ then $v_A \mathbin{,,} v_B : A \wedge B$ by merge$_k$ and $\wedge$I.

According to the elaboration rules, $0 \mathbin{,,} 1$ (checked against nat)
could elaborate to either 0 or 1. Our implementation would
elaborate $0 \mathbin{,,} 1$ to 0, because it tries the left part first. Arguably, this
is better behaviour than actual randomness, but hardly helpful to
the programmer. Perhaps even more confusingly, suppose we are
checking $0 \mathbin{,,} 1$ against $\mathsf{pos} \wedge \mathsf{nat}$, where pos and nat are as in
Section 6.5. Our implementation would elaborate $0 \mathbin{,,} 1$ to (1, 0), but
$1 \mathbin{,,} 0$ to (1, 1).

Since the behaviour of the target program depends on the particular
elaboration typing used, the system lacks coherence (Reynolds
1991).
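
The left-first behaviour described above can be reproduced with a small sketch. This is not the Stardust implementation: the encoding of expressions (integer literals and `("merge", e1, e2)`), the encoding of types (`"nat"`, `"pos"`, `("and", A1, A2)`) and the function names are assumptions made for illustration.

```python
def checks(n, base):
    """Does the numeral n have base type 'nat' (n >= 0) or 'pos' (n >= 1)?"""
    return n >= 0 if base == "nat" else n >= 1

def elaborate(e, ty):
    """Return a target term for e at type ty, or None if e does not check.

    Intersections elaborate e once per component (intersection introduction);
    merges try the left part first and fall back to the right part.
    """
    if isinstance(ty, tuple) and ty[0] == "and":
        m1 = elaborate(e, ty[1])
        m2 = elaborate(e, ty[2])
        return None if m1 is None or m2 is None else (m1, m2)
    if isinstance(e, tuple) and e[0] == "merge":
        left = elaborate(e[1], ty)
        return left if left is not None else elaborate(e[2], ty)
    return e if checks(e, ty) else None
```

With these definitions, the merge of 0 and 1 elaborates to 0 at nat and to (1, 0) at the intersection of pos and nat, whereas the merge of 1 and 0 elaborates to (1, 1) at the same intersection: swapping the parts of the merge changes the target program.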

To recover a coherent semantics, we could limit merges according
to their surface syntax, as Reynolds did in Forsythe, but this
seems restrictive; also, crafting an appropriate syntactic restriction
depends on details of the type system, which is not robust as the
type system is extended. A more general approach might be to
reject (or warn about) merges in which more than one part checks
against the same type (or the same part of an intersection type).
Implementing this seems straightforward, though it would slow
typechecking since we could not skip over $e_2$ when $e_1$ checks in $e_1 \mathbin{,,} e_2$.

Leaving merges aside, the mere fact that $\wedge$I elaborates the
expression twice creates problems with mutable references, as we
saw in Section 6.5. For this, we could revive the value restriction in
$\wedge$I, at least for expressions whose types might overlap.

8. Applying Intersections and Unions 
8.1 Overloading 

A very simple use of unrestricted intersections is to "overload"
operations such as multiplication and conversion of data to printable
form. SML provides overloading only for a fixed set of built-in
operations; it is not possible to write a single square function, as
we do in Figure 10. Despite its appearance, (*[ val square :
... ]*) is not a comment but an annotation used to guide our
bidirectional typechecker (this syntax, inherited from Stardust, was
intended for compatibility with SML compilers, which saw these
annotations as comments and ignored them).

In its present form, this idiom is less powerful than type
classes (Wadler and Blott 1989). We could extend toString for
lists, which would handle lists of integers and lists of reals, but not
lists of lists; the version of toString for lists would use the earlier
occurrence of toString, defined for integers and reals only.
Adding a mechanism for naming a type and then "unioning" it,
recursively, is future work.

    val mul = Int.*
    val toString = Int.toString

    val mul = mul ,, Real.*    (* shadows earlier 'mul' *)
    val toString = toString ,, Real.toString

    (*[ val square : (int → int) ∧ (real → real) ]*)
    val square = fn x => x * x

    val _ = print (toString (mul (0.5, 300.0)) ^ "; ")
    val _ = print (toString (square 9) ^ "; ")
    val _ = print (toString (square 0.5) ^ "\n")

Output of target program after elaboration: 150.0; 81; 0.25

Figure 10. Example of overloading
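
To make the elaboration of the overloading example concrete, here is a hedged Python sketch of what the target program might look like. The helper names `int_mul`, `real_mul`, `proj` and `demo` are hypothetical; the sketch assumes that intersection introduction turns `square` into a pair with one component per conjunct of its type, and that each use site selects a component by projection.

```python
def int_mul(x, y):
    """Stands in for Int.*; the assertion documents the intended argument type."""
    assert isinstance(x, int) and isinstance(y, int)
    return x * y

def real_mul(x, y):
    """Stands in for Real.*."""
    assert isinstance(x, float) and isinstance(y, float)
    return x * y

# Intersection introduction elaborates the body of `square` once per conjunct
# of (int -> int) and (real -> real); in each copy the overloaded
# multiplication has already been resolved to a single component.
square = (lambda x: int_mul(x, x), lambda x: real_mul(x, x))

def proj(k, pair):
    """Target-language projection proj_k, with k in {1, 2}."""
    return pair[k - 1]

def demo():
    """Rebuild the output line of the example, choosing components statically."""
    return "; ".join([
        str(real_mul(0.5, 300.0)),   # mul used at real * real -> real
        str(proj(1, square)(9)),     # square used at int -> int
        str(proj(2, square)(0.5)),   # square used at real -> real
    ])
```

No run-time type test decides which multiplication to call: the choice is fixed by the projection inserted at each use, which is the point of elaborating to a pair.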

8.2 Records 

Reynolds (1996) developed an encoding of records using intersection
types and his version of the merge construct; similar ideas appear
in Castagna et al. (1995). Though straightforward, this encoding
is more expressive than SML records.

The idea is to add single-field records as a primitive notion,
through a type {fld : A} with introduction form {fld = e} and the
usual eliminations (explicit projection and pattern matching). Once
this is done, the multi-field record type {fld1 : A1, fld2 : A2}
is simply {fld1 : A1} ∧ {fld2 : A2}, and the corresponding
intro form is a merge: {fld1 = e1} ,, {fld2 = e2}. More standard
concrete syntax, such as {fld1 = e1, fld2 = e2}, can be handled
trivially during parsing.

With subtyping on intersections, we get the desired behaviour
of what SML calls "flex records" (records with some fields not
listed) with fewer of SML's limitations. Using this encoding, a
function that expects a record with fields x and y can be given any
record that has at least those fields, whereas SML only allows one
fixed set of fields. For example, the code in Figure 11 is legal in our
language but not in SML.

One problem with this approach is that expressions with duplicated
field names are accepted. This is part of the larger issue
discussed in Section 7.
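
The record encoding can be sketched as follows, under stated assumptions: a merged record is modelled as a left-to-right list of single-field records, the hypothetical function `elaborate_at` plays the role of checking that merge against an intersection of single-field record types, and parts are tried left first, as in the implementation's treatment of merges.

```python
def select(fields, name):
    """Pick the first part of the merge that provides field `name`."""
    for field_name, value in fields:
        if field_name == name:
            return value
    raise TypeError("no part of the merge has field " + name)

def elaborate_at(fields, wanted):
    """Elaborate a merged record against the intersection of the single-field
    record types named in `wanted`; the result has one component per field."""
    return tuple(select(fields, name) for name in wanted)

def get_xy(r):
    """Target code for a function expecting {x:int} and {y:int}: two projections."""
    return (r[0], r[1])

# {x = 2, y = 22, extra = 100}, i.e. a merge of three single-field records.
rec2 = [("x", 2), ("y", 22), ("extra", 100)]
```

Passing `rec2` where a record with fields x and y is expected only requires building the pair of the two requested fields, so the extra field is harmless. The same mechanism silently accepts duplicated field names: with parts `[("x", 1), ("x", 2), ("y", 3)]`, the first `x` wins.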

8.3 Heterogeneous Data 

A common argument for dynamic typing over static typing is that 
heterogeneous data structures are more convenient. For example, 
dynamic typing makes it very easy to create and manipulate lists 
containing both integers and strings. The penalty is the loss of 
compile-time invariant checking. Perhaps the lists should contain 
integers and strings, but not booleans; such an invariant is not 
expressible in traditional dynamic typing. 

A common rebuttal from advocates of static typing is that it 
is easy to simulate dynamic typing in static typing. Want a list of 
integers and strings? Just declare a datatype

    datatype int_or_string = Int of int
                           | String of string

and use int_or_string lists. This guarantees the invariant that
the list has only integers and strings, but is unwieldy: each new
element must be wrapped in a constructor, and operations on the list
elements must unwrap the constructor, even when those operations
accept both integers and strings (such as a function of type
(int → string) ∧ (string → string)).
    (*[ val get_xy : {x:int, y:int} → int*int ]*)
    fun get_xy r =
        (#x(r), #y(r))

    (*[ val tupleToString : int * int → string ]*)
    fun tupleToString (x, y) =
        "(" ^ Int.toString x ^ "," ^ Int.toString y ^ ")"

    val rec1 = {y = 11, x = 1}
    val rec2 = {x = 2, y = 22, extra = 100}
    val rec3 = {x = 3, y = 33, other = "a string"}

    val _ = print ("get_xy rec1 = "
                   ^ tupleToString (get_xy rec1) ^ "\n")
    val _ = print ("get_xy rec2 = "
                   ^ tupleToString (get_xy rec2)
                   ^ " (extra = "
                   ^ Int.toString #extra(rec2) ^ ")\n")
    val _ = print ("get_xy rec3 = "
                   ^ tupleToString (get_xy rec3)
                   ^ " (other = " ^ #other(rec3) ^ ")\n")

Output of target program after elaboration:

    get_xy rec1 = (1,11)
    get_xy rec2 = (2,22) (extra = 100)
    get_xy rec3 = (3,33) (other = a string)

Figure 11. Example of flexible multi-field records 



    datatype 'a list = nil | :: of 'a * 'a list

    type dyn = int ∨ real ∨ string

    (*[ val toString : dyn → string ]*)
    fun toString x =
        (Int.toString ,,
         (fn s => s : string) ,,
         Real.toString) x

    (*[ val hetListToString : dyn list → string ]*)
    fun hetListToString xs = case xs of
          nil => "nil"
        | h::t => (toString h) ^ "::"
                  ^ (hetListToString t)

    val _ = print "\n\n"
    val _ = print (hetListToString
                     [1, 2, "what", 3.14159, 4, "why"])
    val _ = print "\n\n\n"

Output of target program after elaboration:

    1::2::what::3.14159::4::why::nil

Figure 12. Example of heterogeneous data 



In this situation, our approach provides the compile-time invariant
checking of static typing and the transparency of dynamic typing.
The type of list elements (if we bother to declare it) is just a
union type:

    type int_union_string = int ∨ string

Elaboration transforms programs with int_union_string into
programs with int_or_string.

Along these lines, we use in Figure 12 a type dyn, defined as
int ∨ real ∨ string. It would be useful to also allow lists, but
the current implementation lacks recursive types of a form that
could express "dyn = ... ∨ dyn list".
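
The shape of the elaborated heterogeneous-list program can be sketched in Python. Everything named here is an assumption of the sketch: dyn is read as (int ∨ real) ∨ string, injections are tagged tuples, and the helper `inject` stands in for the typechecker's static choice of union introduction for each literal (the `isinstance` tests model that compile-time decision and would not appear in the target program).

```python
def inject(v):
    """Model the statically chosen injection for a literal of type dyn."""
    if isinstance(v, int) and not isinstance(v, bool):
        return ("inj", 1, ("inj", 1, v))    # int, inside the left union
    if isinstance(v, float):
        return ("inj", 1, ("inj", 2, v))    # real, inside the left union
    if isinstance(v, str):
        return ("inj", 2, v)                # string
    raise TypeError("not of type dyn")

def to_string(d):
    """Elaborated toString: union elimination becomes a case on injection tags."""
    _, k, payload = d
    if k == 2:
        return payload                      # the (fn s => s : string) part
    _, k_inner, n = payload
    return str(n) if k_inner == 1 else repr(n)   # Int.toString or Real.toString

def het_list_to_string(xs):
    """Mirror of hetListToString over a Python list of injected values."""
    if not xs:
        return "nil"
    return to_string(xs[0]) + "::" + het_list_to_string(xs[1:])
```

Applied to the injected versions of 1, 2, "what", 3.14159, 4 and "why", `het_list_to_string` produces the same line as the figure. At run time the code inspects only injection tags, which is how an ordinary sum type behaves.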



9. Implementation 

Our implementation is faithful to the spirit of the elaboration rules
above, but is substantially richer. It is based on Stardust, a
typechecker for a subset of core Standard ML with support for inductive
datatypes, products, intersections, unions, refinement types and
indexed types (Dunfield 2007), extended with support for (first-class)
polymorphism (Dunfield 2009). We do not yet support all these
features; support for first-class polymorphism looks hardest, since
Standard ML compilers cannot even handle higher-rank predicative
polymorphism. Elaborating programs that use ML-style prenex
polymorphism should work, but we currently lack any proof or
even significant testing to back that up.

Our implementation does currently support merges, intersections
and unions, a top type, a bottom (empty) type, single-field
records and encoded multi-field records (Section 8.2), and inductive
datatypes (if their constructors are not of intersection type,
though they can take intersections and unions as argument; removing
this restriction is a high priority).

9.1 Bidirectional Typechecking 

Our implementation uses bidirectional typechecking (Pierce and
Turner 2000; Dunfield and Pfenning 2004; Dunfield 2009), an
increasingly common technique in advanced type systems; see
Dunfield (2009) for references. This technique offers two major
benefits over Damas-Milner type inference: it works for many
type systems where annotation-free inference is undecidable, and
it seems to produce more localized error messages.

Bidirectional typechecking does need more type annotations.
However, by following the approach of Dunfield and Pfenning
(2004), annotations are never needed except on redexes. The
present implementation allows some annotations on redexes to be
omitted as well.

The basic idea of bidirectional typechecking is to separate the
activity of checking an expression against a known type from the
activity of synthesizing a type from the expression itself:

    $\Gamma \vdash e \Leftarrow A$    $e$ checks against known type $A$
    $\Gamma \vdash e \Rightarrow A$    $e$ synthesizes type $A$

In the checking judgment, $\Gamma$, $e$ and $A$ are inputs to the typing
algorithm, which either succeeds or fails. In the synthesis judgment,
$\Gamma$ and $e$ are inputs and $A$ is output (assuming synthesis does not
fail).

Syntactically speaking, crafting a bidirectional type system
from a type assignment system (like the one in Figure 4) is a matter
of taking the colons in the $\Gamma \vdash e : A$ judgments, and replacing
some with "$\Leftarrow$" and some with "$\Rightarrow$". Except for merge$_k$, our typing
rules can all be found in Dunfield and Pfenning (2004), who argued
that introduction rules should check and elimination rules should
synthesize. (Parametric polymorphism muddies this picture, but
see Dunfield (2009) for an approach used by our implementation.)
For functions, this leads to the bidirectional rules

\[ \frac{\Gamma, x : A \vdash e \Leftarrow B}{\Gamma \vdash \lambda x.\, e \Leftarrow A \to B}\ {\to}\text{I}
\qquad
\frac{\Gamma \vdash e_1 \Rightarrow A \to B \qquad \Gamma \vdash e_2 \Leftarrow A}{\Gamma \vdash e_1\, e_2 \Rightarrow B}\ {\to}\text{E} \]

The merge rule, however, neither introduces nor eliminates. We
implement the obvious checking rule (which, in practice, always
tries to check $e_1$ first and, if that fails, $e_2$):

\[ \frac{\Gamma \vdash e_k \Leftarrow A}{\Gamma \vdash e_1 \mathbin{,,} e_2 \Leftarrow A} \]

Since it can be inconvenient to annotate merges, we also implement
synthesis rules, including one that can synthesize an intersection.

\[ \frac{\Gamma \vdash e_k \Rightarrow A}{\Gamma \vdash e_1 \mathbin{,,} e_2 \Rightarrow A}
\qquad
\frac{\Gamma \vdash e_1 \Rightarrow A_1 \qquad \Gamma \vdash e_2 \Rightarrow A_2}{\Gamma \vdash e_1 \mathbin{,,} e_2 \Rightarrow A_1 \wedge A_2} \]
Given a bidirectional typing derivation, it is generally easy to
show that a corresponding type assignment exists: replace all
"$\Rightarrow$" and "$\Leftarrow$" with ":" (and erase explicit type annotations from the
expression).
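
The division of labour between checking and synthesis can be sketched as two mutually recursive functions. This is a toy under stated assumptions, not the Stardust algorithm: the tuple encodings, the function names, the tiny `sub` relation (type equality plus intersection elimination) and the preference for synthesizing an intersection from a merge are all choices made for this example.

```python
# Hypothetical encodings for this sketch:
#   types: ("int",) ("real",) ("arrow", A, B) ("and", A, B)
#   terms: ("var", x) ("lit", n) ("lam", x, e) ("app", e1, e2)
#          ("merge", e1, e2) ("anno", e, A)

def sub(s, a):
    """Tiny stand-in for subsumption: equal types, or intersection elimination."""
    return s == a or (s[0] == "and" and (sub(s[1], a) or sub(s[2], a)))

def check(ctx, e, a):
    """Checking judgment: ctx, e and a are all inputs; returns True or False."""
    if a[0] == "and":                         # intersection introduction
        return check(ctx, e, a[1]) and check(ctx, e, a[2])
    if e[0] == "lam":                         # arrow introduction checks
        return a[0] == "arrow" and check({**ctx, e[1]: a[1]}, e[2], a[2])
    if e[0] == "merge":                       # try e1 first, then e2
        return check(ctx, e[1], a) or check(ctx, e[2], a)
    s = synth(ctx, e)                         # otherwise switch to synthesis
    return s is not None and sub(s, a)

def arrows(a):
    """Arrow types reachable from a by intersection elimination."""
    if a[0] == "arrow":
        yield a
    elif a[0] == "and":
        yield from arrows(a[1])
        yield from arrows(a[2])

def synth(ctx, e):
    """Synthesis judgment: returns the synthesized type, or None on failure."""
    tag = e[0]
    if tag == "var":
        return ctx.get(e[1])
    if tag == "lit":
        return ("int",) if isinstance(e[1], int) else ("real",)
    if tag == "anno":
        return e[2] if check(ctx, e[1], e[2]) else None
    if tag == "app":                          # arrow elimination synthesizes
        f = synth(ctx, e[1])
        if f is None:
            return None
        for _, dom, cod in arrows(f):
            if check(ctx, e[2], dom):
                return cod
        return None
    if tag == "merge":
        a1, a2 = synth(ctx, e[1]), synth(ctx, e[2])
        if a1 is not None and a2 is not None:
            return ("and", a1, a2)            # synthesize an intersection
        return a1 if a1 is not None else a2
    return None                               # an unannotated lambda cannot synthesize
```

In this sketch an annotated identity function at the intersection of int → int and real → real synthesizes int when applied to 9 and real when applied to 0.5, while an unannotated lambda in synthesizing position fails, which matches the observation that annotations are needed on redexes.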

9.2 Performance 

Intersection typechecking is PSPACE-hard (Reynolds 1993). In
practice, we elaborate the examples in Figures 10, 11 and 12 in
less than a second, but they are very small. On somewhat larger
examples, such as those discussed by Dunfield (2007), the
non-elaborating version of Stardust could take minutes, thanks to heavy
use of backtracking search (trying $\wedge$E$_1$ then $\wedge$E$_2$, etc.) and the
need to check the same expression against different types ($\wedge$I)
or with different assumptions ($\vee$E). Elaboration doesn't help with
this, but it shouldn't hurt by more than a constant factor: the shapes
of the derivations and the labour of backtracking remain the same.

To scale the approach to larger programs, we will need to
consider how to efficiently represent elaborated intersections and
unions. Like the theoretical development, the implementation has
2-way intersection and union types, so the type $A_1 \wedge A_2 \wedge A_3$ is
parsed as $(A_1 \wedge A_2) \wedge A_3$, which becomes $(A_1 * A_2) * A_3$. A
flattened representation $A_1 * A_2 * A_3$ would be more efficient, except
when the program uses values of type $(A_1 \wedge A_2) \wedge A_3$ where
values of type $A_1 \wedge A_2$ are expected; in that case, nesting the product
allows the inner pair to be passed directly with no reboxing.
Symmetry is also likely to be an issue: passing $v : A_1 \wedge A_2$ where
$v : A_2 \wedge A_1$ is expected requires building a new pair. Here, it may
be helpful to put the components of intersections into a canonical
order.

The foregoing applies to unions as well: introducing a value of
a three-way union may require two injections, and so on.
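
The representation trade-off can be illustrated with Python tuples standing in for target-language products. This is only a sketch with hypothetical helper names; it says nothing about the actual costs in an ML back end.

```python
def proj(k, pair):
    """Target-language projection proj_k, with k in {1, 2}."""
    return pair[k - 1]

def nested(a1, a2, a3):
    """Nested representation (A1 * A2) * A3, as produced by 2-way intersections."""
    return ((a1, a2), a3)

def flat(a1, a2, a3):
    """Flattened representation A1 * A2 * A3."""
    return (a1, a2, a3)

def inner_from_nested(v):
    """Use at the type of the first two components: reuse the existing inner pair."""
    return proj(1, v)

def inner_from_flat(v):
    """Same use from the flat form: a fresh pair must be built (reboxing)."""
    return (v[0], v[1])

def swap(v):
    """Passing a pair where its components are expected in the other order."""
    return (proj(2, v), proj(1, v))
```

With the nested form, `inner_from_nested(n)` returns the very object stored in `n`, whereas the flat form and the symmetric case each construct a new tuple.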

10. Related Work 

Intersections were originally developed by Coppo et al. (1981)
and Pottinger (1980), among others; Hindley (1992) gives a useful
introduction and bibliography. Work on union types began
later (MacQueen et al. 1986); Barbanera et al. (1995) is a key
paper on type assignment for unions.

Forsythe. In the late 1980s, Reynolds invented Forsythe (Reynolds
1996; see footnote 3), the first practical programming language based on
intersection types. In addition to an unmarked introduction rule like $\wedge$I, the
Forsythe type system includes rules for typing a construct $p_1, p_2$:
"a construction for intersecting or 'merging' meanings" (Reynolds
1996, p. 24). Roughly analogous to $e_1 \mathbin{,,} e_2$, this construct is used to
encode a variety of features, but can only be used unambiguously.
For instance, a record and a function can be merged, but two
functions cannot (actually they can, but the second phrase $p_2$ overrides
the first). Forsythe does not have union types.

The λ&-calculus. Castagna et al. (1995) developed the λ&-calculus,
which has &-terms: functions whose body is a merge,
and whose type is an intersection of arrows. In their semantics,
applying a &-term to some argument reduces the term to the branch
of the merge with the smallest (compatible) domain. Suppose we
have a &-term with two branches, one of type nat → nat and one
of type pos → pos. Applying that &-term to a value of type pos
steps to the second branch, because its domain pos is (strictly) a
subtype of nat.

Despite the presence of a merge-like construct, their work on the
λ&-calculus is markedly different from ours: it gives a semantics
to programs directly, and uses type information to do so, whereas
we elaborate to a standard term language with no runtime type
information. In their work, terms have both compile-time types
and run-time types (the run-time types become more precise as
the computation continues); the semantics of applying a &-term
depends on the run-time type of the argument to choose the branch.
The choice of the smallest compatible domain is consistent with
notions of inheritance in object-oriented programming, where a
class can override the methods of its parent.

Footnote 3: The citation year 1996 is the date of the revised description of
Forsythe; the core ideas are found in Reynolds (1988).

Semantic subtyping. Following the λ&-calculus, Frisch et al.
(2008) investigated a notion of purely semantic subtyping, where
the definition of subtyping arises from a model of types, as
opposed to the syntactic approach used in our system. They support
intersections, unions, function spaces and even complement. Their
language includes a dynamic type dispatch which, very roughly,
combines a merge with a generalization of our union elimination.
Again, the semantics relies on run-time type information.

Pierce's work. The earliest reference I know for the idea of
compiling intersection to product is Pierce (1991b): "a language
with intersection types might even provide two different
object-code sequences for the two versions of + [for int and for real]" (p.
11). Pierce also developed a language with union types, including
a term-level construct to explicitly eliminate them (Pierce 1991a).
But this construct is only a marker for where to eliminate the union:
it has only one branch, so the same term must typecheck under each
assumption. Another difference is that this construct is the only way
to eliminate a union type in his system, whereas our $\vee$E is
marker-free. Intersections, also present in his language, have no explicit
introduction construct; the introduction rule is like our $\wedge$I.

Flow types. Turbak et al. (1997) and Wells et al. (2002) use
intersections in a system with flow types. They produce programs with
virtual tuples and virtual sums, which correspond to the tuples and
sums we produce by elaboration. However, these constructs are
internal: nothing in their work corresponds to our explicit intersection
and union term constructors, since their system is only intended to
capture existing flow properties. They do not compile the virtual
constructs into the ordinary ones.

Heterogeneous data and dynamic typing. Several approaches
to combining dynamic typing's transparency and static typing's
guarantees have been investigated. Soft typing (Cartwright and
Fagan 1991; Aiken et al. 1994) adds a kind of type inference
on top of dynamic typing, but provides no ironclad guarantees.
Typed Scheme (Tobin-Hochstadt and Felleisen 2008), developed to
retroactively type Scheme programs, has a flow-sensitive type
system with union types, directly supporting heterogeneous data in the
style of Section 8.3. Unlike soft typing, Typed Scheme guarantees
type safety and provides genuine (even first-class) polymorphism,
though programmers are expected to provide some annotations.

Type refinements. Restricting intersections and unions to
refinements of a single base type simplifies many issues, and is
conservative: programs can be checked against refined types, then compiled
normally. This approach has been explored for intersections
(Freeman and Pfenning 1991; Davies and Pfenning 2000), and for
intersections and unions (Dunfield and Pfenning 2003, 2004).

11. Conclusion 

We have laid a simple yet powerful foundation for compiling unre- 
stricted intersections and unions: elaboration into a standard func- 
tional language. Rather than trying to directly understand the be- 
haviours of source programs, we describe them via their consis- 
tency with the target programs. 

The most immediate challenge is coherence: While our elabora- 
tion approach guarantees type safety of the compiled program, the 
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meaning of the compiled program depends on the particular elab- 
oration typing derivation used; the meaning of the source program 
is actually implementation-defined. 

One possible solution is to restrict typing of merges so that 
a merge has type A only if exactly one branch has type A. We 
could also partially revive the value restriction, giving non-values 
intersection type only if (to a conservative approximation) both 
components of the intersection are provably disjoint, in the sense 
that no merge-free expression has both types. 

Another challenge is to reconcile, in spirit and form, the un- 
restricted view of intersections and unions of this paper with the 
refinement approach. Elaborating a refinement intersection like 
(pos — ) neg) A (neg — > pos) to a pair of functions seems 
pointless (unless it can somehow facilitate optimizations in the 
compiler). It will probably be necessary to have "refinement" and 
"unrestricted" versions of the intersection and union type construc- 
tors, at least during elaboration; it may be feasible to hide this 
distinction at the source level. 
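
To see why the pair seems pointless, consider a rough Python sketch of both compilation strategies for a negation function. It assumes that an intersection elaborates to a pair with one component per conjunct, and that uses at each conjunct become projections. The names negate, elaborated, at_pos_to_neg, at_neg_to_pos and erased are hypothetical, and Python's int stands in for the refinements pos and neg.

```python
from typing import Callable, Tuple

IntFn = Callable[[int], int]


def negate(n: int) -> int:
    """Source function of type (pos -> neg) ∧ (neg -> pos)."""
    return -n


# Unrestricted-style elaboration: one pair component per conjunct.
elaborated: Tuple[IntFn, IntFn] = (negate, negate)


def at_pos_to_neg(pair: Tuple[IntFn, IntFn]) -> IntFn:
    """Use at pos -> neg: first projection."""
    return pair[0]


def at_neg_to_pos(pair: Tuple[IntFn, IntFn]) -> IntFn:
    """Use at neg -> pos: second projection."""
    return pair[1]


# Refinement-style compilation: erase the intersection and keep one function.
erased: IntFn = negate
```

Both projections return the same function object, so the pair carries no more information than the erased version.
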
A. Guide to the Twelf development 

This is the PDF part of the auxiliary material to the ICFP 2012 submission, "Elaborating Intersection and Union Types". The rest of the 
auxiliary material is Twelf code, and is available on the web: 

http://www.cs.cmu.edu/~joshuad/intcomp.tar   tar archive 
http://www.cs.cmu.edu/~joshuad/intcomp/      browsable files 

We give an overview and briefly describe each file (mapping back to the paper). 

A.1 Overview 

All the lemmas and theorems in the paper were proved in Twelf (version 1.7.1). The only caveat is that, to avoid the tedium of using 
nontrivial induction measures (Twelf only knows about subterm ordering), we use the %trustme directive to define pacify, yielding a 
blatantly unsound induction measure; see base.elf. All uses of this unsound measure can be found with 

grep pacify *.elf 

You can easily verify that in each case where pacify is used, the real inductive object is smaller according to either the standard depth 
(maximum path length) or weight (number of constructors, i.e. number of inference rules used) measures. 

In any case, you will need to set the unsafe flag to permit the use of %trustme in the definition of pacify. 
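
For readers unfamiliar with the depth and weight measures mentioned above, here is a minimal Python sketch on a generic derivation tree. The Derivation class and the rule names are hypothetical and unrelated to the Twelf encoding. Depth is counted in nodes here, so a leaf has depth 1; counting edges instead would shift every depth by one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Derivation:
    """A derivation: the rule concluding it and the derivations of its premises."""
    rule: str
    premises: List["Derivation"] = field(default_factory=list)


def depth(d: Derivation) -> int:
    """Number of rule applications on the longest path from the root to a leaf."""
    return 1 + max((depth(p) for p in d.premises), default=0)


def weight(d: Derivation) -> int:
    """Total number of rule applications (constructors) in the derivation."""
    return 1 + sum(weight(p) for p in d.premises)


# A derivation ending in "app" with premises "var" and "lam" (over "var").
example = Derivation("app", [Derivation("var"),
                             Derivation("lam", [Derivation("var")])])
example_depth = depth(example)    # 3
example_weight = weight(example)  # 4
```

An induction is justified under one of these measures when every appeal to the induction hypothesis is on a derivation with a strictly smaller depth (or weight), even if that derivation is not a literal subterm.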

A.2 Files 

• base.elf: Generic definitions not specific to this paper. 

• syntax.elf: Source expressions exp, target terms tm, and types ty, covering much of Figures 2, 5 and 8. 

• is-value.elf: Which source expressions are values (Figure 2). 

• eval-contexts.elf: Evaluation contexts (Figure 2). 

• is-valuetm.elf: Which target terms are values (Figure 5). 

• typeof.elf: A system of rules for a version of Γ ⊢ e : A without subtyping. This system is related to the one in Figure 4 by Theorem 
1 (coerce.elf). 

• typeof+sub.elf: The rules for Γ ⊢ e : A (Figure 4). Also defines subtyping sub A B Coe CoeTyping, corresponding to 
A ≤ B ↪ Coe. In the Twelf development, this judgment carries its own typing derivation (in the typeof.elf system, without 
subtyping) CoeTyping, which shows that the coercion Coe is well-typed. 

• sub-refl.elf and sub-trans.elf: Reflexivity and transitivity of subtyping. 

• coerce.elf: Theorem 1. Given an expression well-typed in the system of typeof+sub.elf, with full subsumption, coercions for 
function types can be inserted to yield an expression well-typed in the system of typeof.elf. Getting rid of subsumption makes the rest 
of the development easier. 

• elab.elf: Elaboration rules deriving Γ ⊢ e : A ↪ M from Figure 9. 

• typeof-elab.elf: Theorems 4 and 5. 

• typeoftm.elf: The typing rules deriving G ⊢ M : T from Figure 6. 

• elab-type-soundness.elf: Theorem 6. 

• step.elf: Stepping rules e ⇝ e' (Figure 3). 

• step-eval-context.elf: Lemma 7 (stepping subexpressions in evaluation position). 

• steptm.elf: Stepping rules M ↦ M' (Figure 7). 

• tm-safety.elf: Theorems 2 and 3 (target type safety and determinism). 

• elab-union.elf, elab-sect.elf, elab-arr.elf: Inversion properties of elaboration for ∨, ∧ and → (Lemmas 8, 9 and 10). 

• value-mono.elf: Value monotonicity of elaboration (Lemma 11). 

• consistency.elf: The main consistency result (Theorem 13) and its multi-step version (Theorem 14). 

• summary.elf: Theorems 15 and 16, which are corollaries of earlier theorems. 